Aim

pMT06 was transfected into various cell types, and mRNA barcode counts were sequenced together with pMT06 pDNA barcode counts. Data for all P53 and GR reporters were collected in four sequencing runs and are analyzed here.


Setup

Libraries

Functions

Loading data

Creating count data frames

Investigate variance of the read counts between individual samples

It looks like most of the variance is between the sequencing runs themselves. This makes sense, because I used two different pDNA libraries for the 4 sequencing runs. Some samples correlate with the pDNA counts: these samples have pDNA contamination and need to be removed.
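
One way to flag the pDNA-contaminated samples mentioned above is to correlate each cDNA sample's barcode counts with the pDNA counts. This is a minimal sketch on simulated data: the matrix layout, the sample names and the 0.9 cutoff are assumptions for illustration, not the values used in this analysis.

```r
# Toy count matrix: rows = barcodes, columns = samples (all names are hypothetical)
set.seed(1)
pdna <- rpois(200, lambda = 100)
counts <- cbind(
  pDNA         = pdna,
  clean_sample = rpois(200, lambda = 100),      # independent of the pDNA counts
  contaminated = pdna + rpois(200, lambda = 5)  # dominated by the pDNA signal
)

# Pearson correlation of log-transformed counts against the pDNA column
log_counts <- log2(counts + 1)
cor_to_pdna <- cor(log_counts[, colnames(log_counts) != "pDNA"],
                   log_counts[, "pDNA"])[, 1]

# Flag samples above an arbitrary, illustrative cutoff
flagged <- names(cor_to_pdna)[cor_to_pdna > 0.9]
flagged
```

In real data the cutoff would have to be chosen by looking at the distribution of correlations, since active reporters also shift the cDNA counts away from the pDNA counts.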


Read distribution


Read distribution per cutoff


pDNA-cDNA correlation


Replicate correlation

Conclusions from the figures above:
Some samples can be excluded from further analysis because they don’t contain useful information. These samples are:
MCF7-KO-DMSO: rep2_seq1, rep1_seq1(?), r2_seq2(?), r1_seq3, r3_seq3
MCF7-KO-Nutlin: rep2_seq1, rep3_seq1(?), r1_seq2, r1_seq3, r3_seq3
MCF7-WT-DMSO: rep2_seq1, rep3_seq1
MCF7-WT-Nutlin: rep3_seq1, r2_seq2, r1_seq3
A549_DMSO: r2_seq3, r3_seq3
A549_Dex10: r2_seq3
A549_Dex100: r2_seq3
A549-Dex-1: r1_seq2, r2_seq3
mES-N2B27-HQ: rep1_seq1
mES-N2B27-RA: rep1_seq1, rep2_seq1


Annotation of the reporters


Scaling the data

# It might be hard to compare the KO with the WT barcode counts because in the KO we don't really have active elements -> 'inactive' elements will take up most of the reads
## How can we scale the different conditions correctly?
ggplot(bc_df %>% 
         filter(neg_ctrls == "Yes", str_detect(tf, "53"), str_detect(sample, "MCF")) %>% 
         dplyr::select(tf, sample, rpm, reporter_id, gcf) %>% 
         unique(), 
       aes(x = sample, y = rpm, color = tf)) + 
  geom_quasirandom(dodge.width = 0.75) + 
  scale_color_brewer(palette = "Dark2") + 
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  facet_wrap(~gcf)

# Background per sample: mean rpm of the P53 negative-control reporters
bc_df_scale <- bc_df %>%
  filter(neg_ctrls == "Yes", str_detect(tf, "53")) %>% 
  dplyr::select(sample, rpm, reporter_id, gcf) %>%
  unique() %>% 
  group_by(sample) %>%
  summarise(background_rpm = mean(rpm), .groups = "drop") %>%
  # pDNA samples are not scaled
  filter(str_detect(sample, "pDNA", negate = TRUE))

# Full outer join on the shared column (sample); pDNA samples get NA as background_rpm
bc_df <- merge(bc_df, bc_df_scale, all = TRUE)

# Scale every barcode by the negative-control background of its sample
bc_df <- bc_df %>%
  mutate(rpm_norm = rpm / background_rpm)

ggplot(bc_df %>% 
         filter(neg_ctrls == "Yes", str_detect(tf, "53"), str_detect(sample, "MCF")) %>% 
         dplyr::select(tf, sample, rpm_norm, reporter_id, gcf) %>% 
         unique(), 
       aes(x = sample, y = rpm_norm, color = tf)) + 
  geom_quasirandom(dodge.width = 0.75) + 
  scale_color_brewer(palette = "Dark2") + 
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  facet_wrap(~gcf)

# Looks quite OK. MCF7_WT_DMSO_r1_gcf6412 has some outliers, so this sample may need to be removed
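
As a sanity check on the scaling above: after dividing by the per-sample background, the negative controls should average 1 in every sample. The sketch below shows this on toy data in base R. The data frame, its values and the sample names are made up; only the column names follow the code above.

```r
# Toy data: two hypothetical samples, each with two negative controls and one active reporter
toy <- data.frame(
  sample    = rep(c("KO_DMSO", "WT_DMSO"), each = 3),
  neg_ctrls = rep(c("Yes", "Yes", "No"), times = 2),
  rpm       = c(8, 12, 11, 2, 4, 30)
)

# Per-sample background: mean rpm of the negative controls
background <- aggregate(rpm ~ sample, data = toy[toy$neg_ctrls == "Yes", ], FUN = mean)
names(background)[names(background) == "rpm"] <- "background_rpm"

# Scale every reporter by the background of its sample
toy <- merge(toy, background, by = "sample")
toy$rpm_norm <- toy$rpm / toy$background_rpm

# Mean of the scaled negative controls per sample (should be 1 for each sample)
is_neg <- toy$neg_ctrls == "Yes"
neg_means <- tapply(toy$rpm_norm[is_neg], toy$sample[is_neg], mean)
neg_means
```

A sample whose scaled negative controls spread widely around 1 would be a candidate for closer inspection.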

Normalization of barcode counts:

Divide cDNA barcode counts by pDNA barcode counts
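
The code for this step is not shown in this report, so the following is only a rough sketch of such a division on toy data. The column names (`barcode`, `sample`, `rpm`, `pDNA_rpm`, `activity`) and the choice to drop barcodes without pDNA reads are assumptions for illustration.

```r
# Toy cDNA counts in long format (hypothetical columns and values)
cdna <- data.frame(
  barcode = rep(c("bc1", "bc2", "bc3"), times = 2),
  sample  = rep(c("s1", "s2"), each = 3),
  rpm     = c(10, 40, 5, 20, 80, 7)
)

# Toy pDNA counts, one value per barcode
pdna <- data.frame(
  barcode  = c("bc1", "bc2", "bc3"),
  pDNA_rpm = c(5, 20, 0)
)

# Attach the pDNA count to every cDNA row
activity_df <- merge(cdna, pdna, by = "barcode")

# Drop barcodes without pDNA reads to avoid division by zero
activity_df <- activity_df[activity_df$pDNA_rpm > 0, ]

# Activity = cDNA counts relative to pDNA counts
activity_df$activity <- activity_df$rpm / activity_df$pDNA_rpm
activity_df
```

Dividing by the pDNA counts corrects for how abundant each barcode is in the plasmid library, so that barcodes can be compared with each other.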


Characterize reporter activities


Reporter activity correlations


Removing outliers


Technical replicate correlations


Export data


Session Info

paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time:  23.01659 mins"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE_deep_scan_trp53_gr/analyses"
date()
## [1] "Wed Jul 21 15:03:27 2021"
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] stats4    grid      parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] PCAtools_2.2.0              ggrepel_0.9.1              
##  [3] DESeq2_1.30.1               SummarizedExperiment_1.20.0
##  [5] Biobase_2.50.0              MatrixGenerics_1.2.1       
##  [7] matrixStats_0.59.0          GenomicRanges_1.42.0       
##  [9] GenomeInfoDb_1.26.7         IRanges_2.24.1             
## [11] S4Vectors_0.28.1            BiocGenerics_0.36.1        
## [13] tidyr_1.1.3                 viridis_0.6.1              
## [15] viridisLite_0.4.0           ggpointdensity_0.1.0       
## [17] ggbiplot_0.55               scales_1.1.1               
## [19] factoextra_1.0.7            shiny_1.6.0                
## [21] pheatmap_1.0.12             gridExtra_2.3              
## [23] RColorBrewer_1.1-2          readr_1.4.0                
## [25] haven_2.4.1                 ggbeeswarm_0.6.0           
## [27] plotly_4.9.4.1              tibble_3.1.2               
## [29] dplyr_1.0.7                 vwr_0.3.0                  
## [31] latticeExtra_0.6-29         lattice_0.20-41            
## [33] stringdist_0.9.6.3          GGally_2.1.2               
## [35] ggpubr_0.4.0                ggplot2_3.3.5              
## [37] stringr_1.4.0               plyr_1.8.6                 
## [39] data.table_1.14.0          
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1              backports_1.2.1          
##   [3] lazyeval_0.2.2            splines_4.0.5            
##   [5] crosstalk_1.1.1           BiocParallel_1.24.1      
##   [7] digest_0.6.27             htmltools_0.5.1.1        
##   [9] fansi_0.5.0               magrittr_2.0.1           
##  [11] memoise_2.0.0             openxlsx_4.2.4           
##  [13] annotate_1.68.0           jpeg_0.1-8.1             
##  [15] colorspace_2.0-2          blob_1.2.1               
##  [17] xfun_0.24                 crayon_1.4.1             
##  [19] RCurl_1.98-1.3            jsonlite_1.7.2           
##  [21] genefilter_1.72.1         survival_3.2-10          
##  [23] glue_1.4.2                gtable_0.3.0             
##  [25] zlibbioc_1.36.0           XVector_0.30.0           
##  [27] DelayedArray_0.16.3       car_3.0-11               
##  [29] BiocSingular_1.6.0        abind_1.4-5              
##  [31] DBI_1.1.1                 rstatix_0.7.0            
##  [33] Rcpp_1.0.7                xtable_1.8-4             
##  [35] dqrng_0.3.0               foreign_0.8-81           
##  [37] bit_4.0.4                 rsvd_1.0.5               
##  [39] htmlwidgets_1.5.3         httr_1.4.2               
##  [41] ellipsis_0.3.2            pkgconfig_2.0.3          
##  [43] reshape_0.8.8             XML_3.99-0.6             
##  [45] farver_2.1.0              sass_0.4.0               
##  [47] locfit_1.5-9.4            utf8_1.2.1               
##  [49] tidyselect_1.1.1          labeling_0.4.2           
##  [51] rlang_0.4.11              reshape2_1.4.4           
##  [53] later_1.2.0               AnnotationDbi_1.52.0     
##  [55] munsell_0.5.0             cellranger_1.1.0         
##  [57] tools_4.0.5               cachem_1.0.5             
##  [59] cli_3.0.0                 generics_0.1.0           
##  [61] RSQLite_2.2.7             broom_0.7.8              
##  [63] evaluate_0.14             fastmap_1.1.0            
##  [65] yaml_2.2.1                knitr_1.33               
##  [67] bit64_4.0.5               zip_2.2.0                
##  [69] purrr_0.3.4               sparseMatrixStats_1.2.1  
##  [71] mime_0.11                 rstudioapi_0.13          
##  [73] compiler_4.0.5            beeswarm_0.4.0           
##  [75] curl_4.3.2                png_0.1-7                
##  [77] ggsignif_0.6.2            geneplotter_1.68.0       
##  [79] bslib_0.2.5.1             stringi_1.7.2            
##  [81] highr_0.9                 forcats_0.5.1            
##  [83] Matrix_1.3-2              vctrs_0.3.8              
##  [85] pillar_1.6.1              lifecycle_1.0.0          
##  [87] jquerylib_0.1.4           cowplot_1.1.1            
##  [89] bitops_1.0-7              irlba_2.3.3              
##  [91] httpuv_1.6.1              R6_2.5.0                 
##  [93] promises_1.2.0.1          rio_0.5.27               
##  [95] vipor_0.4.5               assertthat_0.2.1         
##  [97] withr_2.4.2               GenomeInfoDbData_1.2.4   
##  [99] hms_1.1.0                 beachmat_2.6.4           
## [101] rmarkdown_2.9             DelayedMatrixStats_1.12.3
## [103] carData_3.0-4